Clustering with or without the Approximation

Authors

  • Frans Schalekamp
  • Michael Yu
  • Anke van Zuylen
Abstract

We study algorithms for clustering data that were recently proposed by Balcan, Blum and Gupta in SODA’09 [4] and that have already given rise to two follow-up papers. The input for the clustering problem consists of points in a metric space and a number k, specifying the desired number of clusters. The algorithms find a clustering that is provably close to a target clustering, provided that the instance has the “(1 + α, ε)-property”: every solution to the k-median problem whose objective value is at most (1 + α) times the optimal objective value corresponds to a clustering that misclassifies at most an ε fraction of the points with respect to the target clustering. We investigate the theoretical and practical implications of their results. Our main contributions are as follows. First, we show that instances that have the (1 + α, ε)-property and for which, additionally, the clusters in the target clustering are large, are easier than general instances: the algorithm proposed in [4] is a constant factor approximation algorithm with an approximation guarantee that is better than the known hardness of approximation for general instances. Further, we show that it is NP-hard to check whether an instance satisfies the (1 + α, ε)-property for a given (α, ε), even though the algorithms in [4] need such α and ε as input parameters. We propose ways to use their algorithms even if we do not know values of α and ε for which the assumption holds. Finally, we implement these methods and other popular methods, and test them on real-world data sets. We find that on these data sets there are no α and ε such that the data set has both the (1 + α, ε)-property and sufficiently large clusters in the target solution. For the general case, we show that on our data sets the performance guarantee proved by [4] is meaningless for the values of α, ε such that the data set has the (1 + α, ε)-property.
The algorithm nonetheless gives reasonable results, although it is outperformed by other
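
The two quantities that the (1 + α, ε)-property relates, the k-median objective value and the fraction of points misclassified with respect to the target clustering, can be illustrated with a minimal Python sketch. This is not the algorithm of [4] or the authors' implementation; the function names, the brute-force matching of cluster labels (only practical for small k), and the explicit list of candidate solutions are all assumptions made for illustration. Since checking the property over all solutions is NP-hard according to the abstract, the last function only tests a finite list of candidates supplied by the caller.

```python
from itertools import permutations


def kmedian_cost(points, centers, dist):
    """k-median objective: each point pays its distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def misclassified_fraction(labels, target, k):
    """Fraction of points on which two clusterings disagree, minimized over
    all relabelings of the k clusters (brute force, so only for small k)."""
    n = len(labels)
    best = n
    for perm in permutations(range(k)):
        errors = sum(1 for a, b in zip(labels, target) if perm[a] != b)
        best = min(best, errors)
    return best / n


def violates_property(candidates, target, opt_cost, alpha, eps, k):
    """Hypothetical helper: True if some candidate (cost, labels) pair has
    cost <= (1 + alpha) * opt_cost yet misclassifies more than an eps
    fraction of the points. It inspects only the given candidates, so a
    False result does not prove that the property holds."""
    threshold = (1 + alpha) * opt_cost
    return any(
        cost <= threshold and misclassified_fraction(labels, target, k) > eps
        for cost, labels in candidates
    )


if __name__ == "__main__":
    points = [0, 1, 10, 11]            # points on a line, distance |x - y|
    dist = lambda x, y: abs(x - y)
    target = [0, 0, 1, 1]
    print(kmedian_cost(points, [0, 10], dist))
    print(misclassified_fraction([0, 0, 0, 1], target, 2))
```

A candidate list that contains a near-optimal solution far from the target clustering is a witness that the property fails for the chosen α and ε; the absence of such a witness in the list is inconclusive.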


Similar resources

An Ant-Colony Optimization Clustering Model for Cellular Automata Routing in Wireless Sensor Networks

Highly efficient routing is an important issue for the design of wireless sensor network (WSN) protocols to meet the severe hardware and resource constraints. This paper presents an inclusive evolutionary reinforcement method. The proposed approach is a combination of Cellular Automata (CA) and Ant Colony Optimization (ACO) techniques in order to create collision-free trajectories for every agent...


Signal processing approaches as novel tools for the clustering of N-acetyl-β-D-glucosaminidases

Nowadays, the clustering of proteins, and of enzymes in particular, is one of the most popular topics in bioinformatics. An increasing number of chitinase genes from different organisms and their sequences have been identified. So far, various mathematical algorithms for the clustering of chitinase genes have been used, but most of them seem to be confusing and sometimes insufficient. In the...


A Two Level Approximation Technique for Structural Optimization

This work presents a method for optimum design of structures, where the design variables can be considered as continuous or discrete. The variables are chosen as sizing variables as well as coordinates of joints. The main idea is to reduce the number of structural analyses and the overall cost of optimization. In each design cycle, first the structural response quantities such as forces, displac...


Estimation of Software Reliability by Sequential Testing with Simulated Annealing of Mean Field Approximation

Various problems of combinatorial optimization and permutation can be solved with neural network optimization. The problem of estimating the software reliability can be solved with the optimization of failed components to its minimum value. Various solutions of the problem of estimating the software reliability have been given. These solutions are exact and heuristic, but all the exact approach...


Verification and Validation of Common Derivative Terms Approximation in Meshfree Numerical Scheme

In order to improve the approximation of spatial derivatives without meshes, a set of meshfree numerical schemes for derivative terms is developed, which is compatible with Cartesian, cylindrical, and spherical coordinates. Based on the comparisons between numerical and theoretical solutions, errors and convergences are assessed by an a posteriori method, which shows that the approximations...


Variability of the Cyclin-Dependent Kinase 2 Flexibility Without Significant Change in the Initial Conformation of the Protein or Its Environment; a Computational Study

Background: Protein flexibility, which has been referred to as a dynamic behavior, has various roles in proteins’ functions. Furthermore, for some tools developed in bioinformatics, such as protein-protein docking software, considering protein flexibility yields a higher degree of accuracy. Through undertaking the present work, we have accomplished the quantification plus analysis of the varia...



Journal title:
  • J. Comb. Optim.

Volume 25

Publication date 2010